Covid19 Japanが独自に収集している陽性者単位のデータ(個票データ)。ソースとデータは全てGitHubにて公開されており、データはJSON形式。「レコード数 \(\neq\) 累計陽性者数」であることに注意。
Covid19 JapanがGitHubで公開しているデータは前述のようにJSON形式であり、最新データはlatest.jsonファイルで示されている。このため、読み込む際はひと工夫必要。
陽性者単位の個票データ。
path <- "https://raw.githubusercontent.com/reustle/covid19japan-data/master/docs/patient_data/"
df <- path %>%
paste0("latest.json") %>%
readr::read_lines() %>%
paste0(path, .) %>%
jsonlite::fromJSON()
df
死亡者数や重症者数などの推移データはsummaryフォルダ内のJSON形式ファイルにまとめられている。summaryフォルダの他にsummary_minフォルダというフォルダがあるが、summary_minフォルダ内のJSONファイルは単に改行を省略して小さくしたファイル。
path <- "https://raw.githubusercontent.com/reustle/covid19japan-data/master/docs/summary/"
df_s <- path %>%
paste0("latest.json") %>%
readr::read_lines() %>%
paste0(path, .) %>%
jsonlite::fromJSON()
df_s %>% summary()
## Length Class Mode
## prefectures 27 data.frame list
## regions 12 data.frame list
## daily 37 data.frame list
## updated 1 -none- character
要約すると分かるように3つのデータフレーム(都道府県単位、八地方区分単位、日次)と一つのベクトル(更新日時)から構成されている。
更新日次時点における都道府県単位での累積値。陽性者・死亡者などの時系列集計データはネストで格納されている。
厚生労働省のオープンデータが集計から除いている空港検疫・ダイヤモンドプリンセス・長崎クルーズ船・その他を含めて全51区分。
df_s$prefectures
更新日次時点における八地方区分単位での累積値。陽性者・死亡者などの時系列集計データは都道府県単位と同様にネストで格納されている。
ただし、確認した時点(2020/11/3)では、時系列集計値の合計と累積値が一致しない。
df_s$regions
個票データを日次で集計したもの。累積値の他に移動平均も含まれているが、日付を見れば分かる通り暗黙の欠落を含んでいる。
df_s$daily
集計データの更新日時。
df_s$updated
## [1] "2020-11-06T21:31:57+09:00"
最初にデータがどのようになっているか確認する。これには要約に便利なskimrパッケージを用いる。
df %>%
skimr::skim()
| Name | Piped data |
| Number of rows | 107792 |
| Number of columns | 23 |
| _______________________ | |
| Column type frequency: | |
| character | 19 |
| logical | 3 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| patientId | 0 | 1.00 | 1 | 8 | 0 | 106109 | 0 |
| dateAnnounced | 0 | 1.00 | 10 | 10 | 0 | 283 | 0 |
| gender | 16617 | 0.85 | 1 | 1 | 0 | 2 | 0 |
| detectedPrefecture | 0 | 1.00 | 3 | 15 | 0 | 49 | 0 |
| patientStatus | 103813 | 0.04 | 8 | 23 | 0 | 8 | 0 |
| notes | 55914 | 0.48 | 1 | 270 | 0 | 49116 | 1 |
| mhlwPatientNumber | 107343 | 0.00 | 1 | 11 | 0 | 434 | 0 |
| prefecturePatientNumber | 14347 | 0.87 | 5 | 20 | 0 | 93436 | 0 |
| prefectureSourceURL | 76488 | 0.29 | 5 | 224 | 0 | 3439 | 0 |
| residence | 24617 | 0.77 | 1 | 38 | 0 | 1422 | 0 |
| sourceURL | 637 | 0.99 | 1 | 239 | 0 | 8095 | 0 |
| relatedPatients | 97396 | 0.10 | 2 | 259 | 0 | 6345 | 0 |
| knownCluster | 105310 | 0.02 | 3 | 88 | 0 | 229 | 0 |
| detectedCityTown | 81793 | 0.24 | 2 | 22 | 0 | 663 | 0 |
| cityPrefectureNumber | 82058 | 0.24 | 1 | 34 | 0 | 25725 | 2 |
| citySourceURL | 95960 | 0.11 | 9 | 317 | 0 | 3637 | 0 |
| deceasedDate | 105972 | 0.02 | 10 | 10 | 0 | 233 | 0 |
| deceasedReportedDate | 106578 | 0.01 | 10 | 62 | 0 | 204 | 0 |
| deathSourceURL | 106722 | 0.01 | 14 | 123 | 0 | 651 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| confirmedPatient | 0 | 1 | 0.98 | TRU: 106108, FAL: 1684 |
| charterFlightPassenger | 107778 | 0 | 1.00 | TRU: 14 |
| cruisePassengerDisembarked | 107781 | 0 | 1.00 | TRU: 11 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| ageBracket | 0 | 1 | 31.96 | 23.8 | -1 | 20 | 30 | 50 | 100 | ▅▇▅▂▁ |
元がJSON形式なので、読み込んだ直後は殆どの変量(フィーチャー)が文字型になっていることが分かる。また、意外と欠損が多いことも分かる。
各変量(フィーチャー)を適切な形式に変換し、地域区分でも分析できるように都道府県データと結合する。
x <- df %>%
dplyr::select(patientId, date = dateAnnounced, gender,
pref = detectedPrefecture, patientStatus, knownCluster,
confirmedPatient, charterFlightPassenger,
cruisePassengerDisembarked, ageBracket,
deceasedDate, deceasedReportedDate) %>%
dplyr::filter(confirmedPatient == TRUE) %>%
dplyr::mutate(date = lubridate::as_date(date),
gender = forcats::as_factor(gender),
patientStatus = forcats::as_factor(patientStatus),
cluster = dplyr::if_else(!is.na(knownCluster), TRUE, FALSE),
ageBracket = forcats::as_factor(ageBracket),
deceasedDate = lubridate::as_date(deceasedDate),
deceasedReportedDate = lubridate::as_date(deceasedReportedDate)) %>%
dplyr::left_join(prefs, by = c("pref" = "pref")) %>%
dplyr::select(-`推計人口`) %>%
dplyr::rename(Pref = `都道府県`, region = `八地方区分`)
x
変換結果を要約してみると
x %>%
skimr::skim()
| Name | Piped data |
| Number of rows | 106108 |
| Number of columns | 19 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| Date | 3 |
| factor | 9 |
| logical | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| patientId | 0 | 1.00 | 1 | 8 | 0 | 106108 | 0 |
| pref | 0 | 1.00 | 3 | 15 | 0 | 49 | 0 |
| knownCluster | 103655 | 0.02 | 3 | 88 | 0 | 227 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 2020-01-15 | 2020-11-06 | 2020-08-14 | 283 |
| deceasedDate | 105729 | 0 | 2020-02-13 | 2020-10-17 | 2020-05-08 | 150 |
| deceasedReportedDate | 105778 | 0 | 2020-02-13 | 2020-10-17 | 2020-05-16 | 131 |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| gender | 15956 | 0.85 | FALSE | 2 | M: 50630, F: 39522 |
| patientStatus | 103575 | 0.02 | FALSE | 8 | Hos: 1261, Dec: 371, Hom: 315, Dis: 283 |
| ageBracket | 0 | 1.00 | FALSE | 13 | 20: 24761, -1: 16055, 30: 15650, 40: 12963 |
| pcode | 1232 | 0.99 | FALSE | 47 | 13: 32165, 27: 13485, 14: 9138, 23: 6625 |
| Pref | 1232 | 0.99 | FALSE | 47 | 東京都: 32165, 大阪府: 13485, 神奈川: 9138, 愛知県: 6625 |
| region | 1232 | 0.99 | FALSE | 8 | 関東地: 54897, 近畿地: 21237, 九州地: 11128, 中部地: 10320 |
| 広域圏 | 8361 | 0.92 | FALSE | 8 | 首都圏: 55126, 近畿圏: 20651, 中部圏: 8994, 九州圏: 7669 |
| 通俗的区分 | 1232 | 0.99 | FALSE | 11 | 関東: 54897, 関西: 20651, 東海: 8644, 九州: 7669 |
| fct_pref | 1232 | 0.99 | FALSE | 47 | Tok: 32165, Osa: 13485, Kan: 9138, Aic: 6625 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| confirmedPatient | 0 | 1 | 1.00 | TRU: 106108 |
| charterFlightPassenger | 106094 | 0 | 1.00 | TRU: 14 |
| cruisePassengerDisembarked | 106097 | 0 | 1.00 | TRU: 11 |
| cluster | 0 | 1 | 0.02 | FAL: 103655, TRU: 2453 |
文字型を因子型に変換するだけでも大まかな傾向が見えるようになる。例えば
ことが読める。
patientStatusは以下の通りで、ほぼ更新されていないのと思われる。死者数などの推移を見る場合はサマリデータを使った方がいい。
x %>%
dplyr::group_by(patientStatus) %>%
dplyr::summarise(n = n()) %>%
dplyr::ungroup() %>%
dplyr::mutate(Japanese = c("回復", "入院中", "退院済", "死亡", "詳細不明",
"重症", "自宅療養", "ホテル療養", NA))
地方区分で比較すると都道府県と同様に人口の多い関東、近畿はともかく、九州、北海道の陽性者率が高いことが分かる。
region <- prefs %>%
dplyr::group_by(`八地方区分`) %>%
dplyr::summarise(population = sum(`推計人口`)) %>%
dplyr::rename(region = `八地方区分`)
x %>%
dplyr::group_by(region) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(region, by = c("region" = "region")) %>%
dplyr::select(region, n, population) %>%
dplyr::mutate(rate = round(n / population, 2))
都道府県別の総陽性者数と人口千人あたりの陽性者率。
x %>%
dplyr::group_by(Pref) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>%
dplyr::select(Pref, n, population = `推計人口`) %>%
dplyr::mutate(rate = round(n / population, 2))
上位10県を累計人数と人口千人あたりの陽性者数で比べてみる。
x %>%
dplyr::group_by(Pref) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>%
dplyr::select(Pref, n, population = `推計人口`) %>%
dplyr::mutate(rate = round(n / population, 2)) %>%
dplyr::slice_max(order_by = n, n = 10) %>%
dplyr::rename(`累計陽性者数` = n, `推計人口[千人]` = population, `率` = rate)
x %>%
dplyr::group_by(Pref) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>%
dplyr::select(Pref, n, population = `推計人口`) %>%
dplyr::mutate(rate = round(n / population, 2)) %>%
dplyr::slice_max(order_by = rate, n = 10) %>%
dplyr::rename(`累計陽性者数` = n, `推計人口[千人]` = population, `率` = rate)
累計の陽性者数は、ほぼ、人口に比例しているが、一部の県での感染率が高いことが分かる。
陽性者数、前日、累計、移動平均。
x_by_all <- x %>%
dplyr::group_by(date) %>%
dplyr::filter(!is.na(Pref)) %>%
dplyr::summarise(n = n()) %>%
tidyr::complete(date = seq.Date(from = min(date), to = max(date), by = "day"),
fill = list(n = 0L)) %>%
dplyr::mutate(diff = lagdiff(n), cum = cumsum(n),
ma7 = zoo::rollmeanr(n, k = 7L, na.pad = TRUE))
x_by_all
x_by_region <- x %>%
dplyr::filter(!is.na(Pref)) %>%
dplyr::group_by(date, region) %>%
dplyr::summarise(n = n()) %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = region, values_from = n, values_fill = 0L) %>%
tidyr::complete(date = seq.Date(from = min(date), to = max(date), by = "day")) %>%
tidyr::pivot_longer(cols = -date, names_to = "region", values_to = "n") %>%
tidyr::replace_na(replace = list(n = 0L)) %>%
dplyr::group_by(region) %>%
tidyr::nest() %>%
dplyr::mutate(diff = purrr::map(data, ~ lagdiff(.$n)),
cum = purrr::map(data, ~ cumsum(.$n)),
ma7 = purrr::map(data, ~ ma7(.$n))) %>%
tidyr::unnest() %>%
dplyr::left_join(prefs %>% dplyr::distinct(`八地方区分`), .,
by = c("八地方区分" = "region")) %>%
dplyr::mutate(region = forcats::fct_inorder(`八地方区分`)) %>%
dplyr::select(date, region, n, diff, cum, ma7) %>%
dplyr::arrange(date)
x_by_region
陽性者数、前日差、累計、移動平均。
x_by_prefs <- x %>%
dplyr::filter(!is.na(Pref)) %>%
dplyr::group_by(date, Pref) %>%
dplyr::summarise(n = n()) %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = Pref, values_from = n, values_fill = 0L) %>%
tidyr::complete(date = seq.Date(from = min(date), to = max(date), by = "day")) %>%
tidyr::pivot_longer(cols = -date, names_to = "Pref", values_to = "n") %>%
tidyr::replace_na(replace = list(n = 0L)) %>%
dplyr::group_by(Pref) %>%
tidyr::nest() %>%
dplyr::mutate(diff = purrr::map(data, ~ lagdiff(.$n)),
cum = purrr::map(data, ~ cumsum(.$n)),
ma7 = purrr::map(data, ~ ma7(.$n))) %>%
tidyr::unnest() %>%
dplyr::left_join(prefs, ., by = c("都道府県" = "Pref")) %>%
dplyr::mutate(Pref = forcats::fct_inorder(`都道府県`)) %>%
dplyr::select(date, Pref, n, diff, cum, ma7) %>%
dplyr::arrange(date)
x_by_prefs
start <- df_s$prefectures %>%
dplyr::select(pref = name, date = dailyDeceasedStartDate) %>%
dplyr::left_join(prefs, by = c("pref" = "pref")) %>%
dplyr::arrange(pcode) %>%
tidyr::drop_na(pcode) %>%
dplyr::select(date, Pref = `都道府県`) %>%
dplyr::distinct(date) %>%
.$date %>% lubridate::as_date()
d_by_prefs <- df_s$prefectures %>%
dplyr::select(deceased = dailyDeceasedCount, pref = name) %>%
dplyr::left_join(prefs, by = c("pref" = "pref")) %>%
tidyr::drop_na(pcode) %>%
dplyr::select(Pref = `都道府県`, deceased) %>%
tidyr::unnest(deceased) %>%
tidyr::pivot_wider(names_from = Pref, values_from = deceased) %>%
tidyr::unnest() %>%
dplyr::mutate(date = seq.Date(from = start, to = start + nrow(.) - 1,
by = "day")) %>%
dplyr::select(date, dplyr::everything()) %>%
tidyr::pivot_longer(col = -date, names_to = "Pref", values_to = "n") %>%
dplyr::group_by(Pref) %>%
tidyr::nest() %>%
dplyr::mutate(diff = purrr::map(data, ~ lagdiff(.$n)),
cum = purrr::map(data, ~ cumsum(.$n)),
ma7 = purrr::map(data, ~ ma7(.$n))) %>%
tidyr::unnest() %>%
dplyr::left_join(prefs, ., by = c("都道府県" = "Pref")) %>%
dplyr::mutate(Pref = forcats::fct_inorder(`都道府県`)) %>%
dplyr::select(date, Pref, n, diff, cum, ma7) %>%
dplyr::arrange(date)
d_by_prefs
集計データ$regionsは、合計が一致しないことがあったので、$prefecturesのデータを元に計算する。
d_by_region <- d_by_prefs %>%
dplyr::select(date, pref = Pref, n) %>%
dplyr::left_join(prefs, by = c("pref" = "都道府県")) %>%
tidyr::drop_na(pcode) %>%
dplyr::group_by(date, `八地方区分`) %>%
dplyr::summarise(n = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::rename(region = `八地方区分`) %>%
dplyr::group_by(region) %>%
tidyr::nest() %>%
dplyr::mutate(diff = purrr::map(data, ~ lagdiff(.$n)),
cum = purrr::map(data, ~ cumsum(.$n)),
ma7 = purrr::map(data, ~ ma7(.$n))) %>%
tidyr::unnest() %>%
dplyr::arrange(date)
d_by_region
sec_scale <- 100
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")
x_by_all %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n), stat = "identity", width = 1.0,
alpha = 0.5) +
ggplot2::geom_line(ggplot2::aes(y = ma7), linetype = "dotted", size = 0.5) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale), colour = "dark green") +
ggplot2::labs(title = paste0("@", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "陽性者数(単日)・移動平均(点線)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積陽性者数(折線)")
)
x_by_all %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_line(ggplot2::aes(y = diff), colour = "dark green", alpha = 0.5) +
ggplot2::labs(title = paste0("@", datetime), caption = caption,
x = "", y = "陽性者数前日差")
x_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date, y = n)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
width = 1.0, alpha = 0.5) +
ggplot2::labs(title = paste0("単日 @", datetime), caption = caption,
x = "", y = "陽性者数[人]")
x_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date, y = ma7, colour = region)) +
ggplot2::geom_line(size = 1) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("7日間移動平均 @", datetime), caption = caption,
x = "", y = "陽性者数[人]") +
ggrepel::geom_text_repel(ggplot2::aes(label = region),
data = subset(x_by_region, date == max(date)),
nudge_x = 30, segment.alpha = 0.5, size = 3) +
ggplot2::lims(x = c(min(x_by_region$date),
max(x_by_region$date) + 45))
x_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date, y = cum, colour = region)) +
ggplot2::geom_line() +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("累積陽性者数@", datetime), caption = caption,
x = "", y = "累積[人]") +
ggrepel::geom_text_repel(ggplot2::aes(label = region),
data = subset(x_by_region, date == max(date)))
sec_scale <- 50
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")
x_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
alpha = 0.5, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = region)) +
ggplot2::facet_wrap(~ region) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Fixed scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "陽性者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積陽性者数(折線)")
)
x_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
alpha = 0.5, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = region),
linetype = "solid", size = 0.25) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = region)) +
ggplot2::facet_wrap(~ region, scales = "free_y") +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Free Y scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "陽性者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積陽性者数(折線)")
)
x_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_line(ggplot2::aes(y = diff, colour = region)) +
ggplot2::facet_wrap(~ region, scales = "free_y") +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("陽性者数前日差, Free Y scale @", datetime),
caption = caption, x = "", y = "")
sec_scale <- 100
ncol <- 5
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")
x_by_prefs %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = Pref), stat = "identity",
alpha = 0.25, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = Pref),
linetype = "solid", size = 0.25) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = Pref)) +
ggplot2::facet_wrap(~ Pref, ncol = ncol) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Fixed scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "陽性者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積陽性者数(折線)")
)
x_by_prefs %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = Pref), stat = "identity",
alpha = 0.35, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = Pref),
linetype = "solid", size = 0.25) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = Pref)) +
ggplot2::facet_wrap(~ Pref, ncol = ncol, scales = "free_y") +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Free Y scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "陽性者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積陽性者数(折線)")
)
x_by_prefs %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_line(ggplot2::aes(y = diff, colour = Pref)) +
ggplot2::facet_wrap(~ Pref, ncol = ncol, scales = "free_y") +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("陽性者数前日差, Free Y scale @", datetime),
x = "", y = "")
sec_scale <- 50
ncol <- 5
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")
d_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
alpha = 0.25, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = region),
linetype = "solid", size = 0.2) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = region)) +
ggplot2::facet_wrap(~ region, ncol = ncol) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Fixed scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "死亡者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積死亡者数(折線)")
)
d_by_region %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
alpha = 0.25, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = region),
linetype = "solid", size = 0.2) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = region)) +
ggplot2::facet_wrap(~ region, ncol = ncol, scales = "free_y") +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Free Y scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "死亡者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積死亡者数(折線)")
)
sec_scale <- 10
ncol <- 5
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")
d_by_prefs %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = Pref), stat = "identity",
alpha = 0.25, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = Pref),
linetype = "solid", size = 0.25) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = Pref)) +
ggplot2::facet_wrap(~ Pref, ncol = ncol) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Fixed scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "死亡者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積死亡者数(折線)")
)
d_by_prefs %>%
ggplot2::ggplot(ggplot2::aes(x = date)) +
ggplot2::geom_bar(ggplot2::aes(y = n, fill = Pref), stat = "identity",
alpha = 0.35, width = 1.0) +
ggplot2::geom_line(ggplot2::aes(y = ma7, colour = Pref),
linetype = "solid", size = 0.25) +
ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = Pref)) +
ggplot2::facet_wrap(~ Pref, ncol = ncol, scales = "free_y") +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = paste0("Free Y scale @", datetime), caption = caption,
x = "", y = "") +
ggplot2::scale_y_continuous(
name = "死亡者数(単日)",
sec.axis = ggplot2::sec_axis(~ . * sec_scale,
name = "累積死亡者数(折線)")
)
x %>%
dplyr::group_by(Pref) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>%
dplyr::select(Pref, n, population = `推計人口`) %>%
dplyr::mutate(rate = round(n / population, 2)) %>%
ggplot2::ggplot(ggplot2::aes(x = population, y = n) ) +
ggplot2::geom_point(ggplot2::aes(colour = Pref)) +
ggrepel::geom_text_repel(ggplot2::aes(label = Pref, colour = Pref)) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = "", caption = caption,
x = "推計人口[千人]", y = "累計陽性者数")
x %>%
dplyr::group_by(Pref) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>%
dplyr::select(Pref, n, population = `推計人口`) %>%
dplyr::mutate(rate = round(n / population, 2)) %>%
dplyr::filter(n < 5000) %>%
# dplyr::slice_min(order_by = n, n = 38) %>%
ggplot2::ggplot(ggplot2::aes(x = population, y = n) ) +
ggplot2::geom_point(ggplot2::aes(colour = Pref)) +
ggrepel::geom_text_repel(ggplot2::aes(label = Pref, colour = Pref)) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = "累計陽性者数千人未満", caption = caption,
x = "推計人口[五千人]", y = "累計陽性者数")
region <- prefs %>%
dplyr::group_by(`八地方区分`) %>%
dplyr::summarise(population = sum(`推計人口`)) %>%
dplyr::rename(region = `八地方区分`)
x %>%
dplyr::group_by(region) %>%
dplyr::summarise(n = n()) %>%
dplyr::left_join(region, by = c("region" = "region")) %>%
dplyr::select(region, n, population) %>%
dplyr::mutate(rate = round(n / population, 2)) %>%
ggplot2::ggplot(ggplot2::aes(x = population, y = n) ) +
ggplot2::geom_point(ggplot2::aes(colour = region)) +
ggrepel::geom_text_repel(ggplot2::aes(label = region, colour = region)) +
ggplot2::theme(legend.position = 'none') +
ggplot2::labs(title = "", caption = caption,
x = "推計人口[千人]", y = "累計陽性者数")
ts_tmp <- x_by_all %>%
dplyr::select(n)
ts_day <- ts_tmp %>%
ts(frequency = 1)
ts_week <- ts_tmp %>%
ts(frequency = 7)
ts_month <- ts_tmp %>%
ts(frequency = 28)
時系列データに変換したものをプロットすると可視化の項でプロットした棒グラフと同じ形のグラフになることが分かります。
ts_day %>%
plot()
ts_week %>%
plot()
ts_month %>%
plot()
上記からトレンド(長期的傾向)を除いたグラフ。デフォルト指定なのでlag = 1。つまり、前日差。
ts_day %>%
base::diff() %>%
plot()
ts_week %>%
base::diff() %>%
plot()
ts_month %>%
base::diff() %>%
plot()
トレンド、季節変動(周期変動)、非周期変動に分解した場合。frequency = 1では分解できない点に注意。
ts_week %>%
stats::decompose() %>%
plot()
ts_month %>%
stats::decompose() %>%
plot()
トレンドだけを抜き出してみる。移動平均に酷似している。
ts_week %>%
stats::decompose() %>%
.$x %>%
plot(ylim = c(0, 1500))
par(new = TRUE)
ts_week %>%
stats::decompose() %>%
.$trend %>%
plot(ylim = c(0, 1500), col = "dark green", lwd = 3)
ts_month %>%
stats::decompose() %>%
.$x %>%
plot(ylim = c(0, 1500))
par(new = TRUE)
ts_month %>%
stats::decompose() %>%
.$trend %>%
plot(ylim = c(0, 1500), col = "dark green", lwd = 3)
x_by_region %>%
dplyr::select(region, n) %>%
split(.$region) %>%
purrr::map(., ~ ts(.$n, frequency = 7)) %>%
purrr::map2(., names(.),
function(.x, region) {
plot(.x, main = region)
} )
## $北海道地方
## NULL
##
## $東北地方
## NULL
##
## $関東地方
## NULL
##
## $中部地方
## NULL
##
## $近畿地方
## NULL
##
## $中国地方
## NULL
##
## $四国地方
## NULL
##
## $九州地方
## NULL
oldpar <- par()
par(mfrow=c(4, 2))
x_by_region %>%
dplyr::select(region, n) %>%
split(.$region) %>%
purrr::map(., ~ ts(.$n, frequency = 7)) %>%
purrr::map2(., names(.),
function(.x, region) {
plot(.x, main = region, ylim = c(0, max(.x)))
# plot(.x, main = region)
par(new = TRUE)
stats::decompose(.x) %>%
.$trend %>%
plot(ylim = c(0, max(.x)), col = "dark green", lwd = 2)
# plot(col = "dark green", lwd = 2)
} )
## $北海道地方
## NULL
##
## $東北地方
## NULL
##
## $関東地方
## NULL
##
## $中部地方
## NULL
##
## $近畿地方
## NULL
##
## $中国地方
## NULL
##
## $四国地方
## NULL
##
## $九州地方
## NULL
par(oldpar)
x_by_region %>%
dplyr::select(region, n) %>%
split(.$region) %>%
purrr::map(., ~ ts(.$n, frequency = 7)) %>%
purrr::map(., forecast::auto.arima) %>%
purrr::map(., forecast::forecast) %>%
purrr::map2(., names(.),
function(.x, region) {
plot(.x, main = region)
} )
## $北海道地方
## $北海道地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 114.8986 111.4334 119.3303 113.9630 114.3875 125.2964 124.2167 123.6029
## [9] 123.7665 124.3513 124.6239 125.1279 125.4604 125.9200
##
## $北海道地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 106.4175 101.92786
## 43.57143 102.1894 97.29593
## 43.71429 108.8158 103.24975
## 43.85714 102.7054 96.74594
## 44.00000 102.1670 95.69787
## 44.14286 112.3704 105.52781
## 44.28571 110.4919 103.22645
## 44.42857 108.4447 100.42044
## 44.57143 107.6565 99.12842
## 44.71429 107.2876 98.25471
## 44.85714 106.6961 97.20564
## 45.00000 106.3481 96.40671
## 45.14286 105.8853 95.52285
## 45.28571 105.5667 94.79227
##
## $北海道地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 123.3797 127.8694
## 43.57143 120.6774 125.5709
## 43.71429 129.8448 135.4108
## 43.85714 125.2207 131.1801
## 44.00000 126.6080 133.0771
## 44.14286 138.2224 145.0650
## 44.28571 137.9415 145.2070
## 44.42857 138.7610 146.7853
## 44.57143 139.8764 148.4045
## 44.71429 141.4149 150.4478
## 44.85714 142.5518 152.0422
## 45.00000 143.9076 153.8490
## 45.14286 145.0355 155.3980
## 45.28571 146.2733 157.0477
##
##
## $東北地方
## $東北地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 29.52341 26.49066 32.19864 23.70527 28.42793 31.82418 29.13804 28.37427
## [9] 30.37628 27.57628 30.03589 29.99012 28.64433 30.94531
##
## $東北地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 23.06896 19.65218
## 43.57143 19.89011 16.39599
## 43.71429 25.45515 21.88537
## 43.85714 16.82182 13.17794
## 44.00000 21.40731 17.69081
## 44.14286 24.66900 20.88128
## 44.28571 21.85080 17.99317
## 44.42857 20.69942 16.63661
## 44.57143 22.53626 18.38600
## 44.71429 19.57450 15.33861
## 44.85714 21.87554 17.55572
## 45.00000 21.67424 17.27208
## 45.14286 20.17577 15.69278
## 45.28571 22.32677 17.76439
##
## $東北地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 35.97787 39.39465
## 43.57143 33.09122 36.58534
## 43.71429 38.94212 42.51191
## 43.85714 30.58872 34.23260
## 44.00000 35.44856 39.16506
## 44.14286 38.97935 42.76707
## 44.28571 36.42528 40.28291
## 44.42857 36.04911 40.11193
## 44.57143 38.21630 42.36655
## 44.71429 35.57807 39.81396
## 44.85714 38.19623 42.51606
## 45.00000 38.30600 42.70816
## 45.14286 37.11289 41.59588
## 45.28571 39.56385 44.12622
##
##
## $関東地方
## $関東地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 481.0509 332.5583 230.8733 376.3721 362.5557 479.3307 477.1103 467.4408
## [9] 330.3481 233.7326 381.4161 368.4719 485.5243 483.3140
##
## $関東地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 417.2398 383.46025
## 43.57143 255.1844 214.22516
## 43.71429 146.3109 101.54632
## 43.85714 286.8952 239.52890
## 44.00000 269.2184 219.80853
## 44.14286 382.7288 331.59081
## 44.28571 377.6402 324.98386
## 44.42857 358.7608 301.22918
## 44.57143 215.7523 155.08902
## 44.71429 114.6442 51.60256
## 44.85714 258.5888 193.56803
## 45.00000 242.3811 175.63270
## 45.14286 356.5139 288.21994
## 45.28571 351.6578 281.96328
##
## $関東地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 544.8619 578.6415
## 43.57143 409.9322 450.8915
## 43.71429 315.4357 360.2003
## 43.85714 465.8491 513.2154
## 44.00000 455.8931 505.3029
## 44.14286 575.9326 627.0705
## 44.28571 576.5805 629.2368
## 44.42857 576.1207 633.6523
## 44.57143 444.9438 505.6071
## 44.71429 352.8210 415.8626
## 44.85714 504.2433 569.2641
## 45.00000 494.5627 561.3112
## 45.14286 614.5347 682.8287
## 45.28571 614.9702 684.6647
##
##
## $中部地方
## $中部地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 108.2969 123.3338 96.4281 116.6071 130.5514 113.5229 141.2756 137.5016
## [9] 114.2121 136.8832 124.0976 133.4757 148.3087 131.0136
##
## $中部地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 90.87744 81.65614
## 43.57143 101.23366 89.53458
## 43.71429 69.39282 55.08121
## 43.85714 84.99494 68.26046
## 44.00000 96.22337 78.05120
## 44.14286 75.27994 55.03536
## 44.28571 100.27686 78.57344
## 44.42857 92.05110 67.99105
## 44.57143 64.18101 37.69614
## 44.71429 83.58122 55.36488
## 44.85714 66.90468 36.62858
## 45.00000 72.95338 40.91479
## 45.14286 84.90143 51.33565
## 45.28571 64.20491 28.83852
##
## $中部地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 125.7164 134.9377
## 43.57143 145.4339 157.1330
## 43.71429 123.4634 137.7750
## 43.85714 148.2193 164.9538
## 44.00000 164.8794 183.0516
## 44.14286 151.7658 172.0104
## 44.28571 182.2743 203.9778
## 44.42857 182.9521 207.0122
## 44.57143 164.2432 190.7281
## 44.71429 190.1851 218.4014
## 44.85714 181.2905 211.5666
## 45.00000 193.9981 226.0366
## 45.14286 211.7160 245.2818
## 45.28571 197.8223 233.1887
##
##
## $近畿地方
## $近畿地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 242.5964 218.3115 161.0967 259.0036 205.3289 246.0516 280.8106 258.1282
## [9] 230.9067 183.5240 261.2406 228.8405 251.8436 276.3661
##
## $近畿地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 211.6222 195.22548
## 43.57143 182.6528 163.77630
## 43.71429 123.4304 103.49109
## 43.85714 218.2638 196.69741
## 44.00000 162.7587 140.22340
## 44.14286 201.2271 177.49841
## 44.28571 234.3675 209.78206
## 44.42857 203.0379 173.87485
## 44.57143 171.8482 140.58444
## 44.71429 121.6132 88.83958
## 44.85714 196.2923 161.91076
## 45.00000 161.4615 125.79329
## 45.14286 182.0255 145.06608
## 45.28571 204.4699 166.41034
##
## $近畿地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 273.5706 289.9673
## 43.57143 253.9701 272.8467
## 43.71429 198.7630 218.7024
## 43.85714 299.7435 321.3099
## 44.00000 247.8991 270.4344
## 44.14286 290.8761 314.6047
## 44.28571 327.2536 351.8391
## 44.42857 313.2184 342.3815
## 44.57143 289.9653 321.2290
## 44.71429 245.4349 278.2085
## 44.85714 326.1889 360.5705
## 45.00000 296.2194 331.8877
## 45.14286 321.6617 358.6211
## 45.28571 348.2622 386.3218
##
##
## $中国地方
## $中国地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 11.98808 11.79483 11.79483 11.79483 11.79483 11.79483 11.79483 11.79483
## [9] 11.79483 11.79483 11.79483 11.79483 11.79483 11.79483
##
## $中国地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 2.9391317 -1.851093
## 43.57143 2.3397557 -2.665458
## 43.71429 2.1516664 -2.953115
## 43.85714 1.9671763 -3.235269
## 44.00000 1.7860862 -3.512222
## 44.14286 1.6082150 -3.784253
## 44.28571 1.4333967 -4.051614
## 44.42857 1.2614795 -4.314539
## 44.57143 1.0923234 -4.573241
## 44.71429 0.9257997 -4.827917
## 44.85714 0.7617890 -5.078749
## 45.00000 0.6001809 -5.325908
## 45.14286 0.4408729 -5.569548
## 45.28571 0.2837694 -5.809817
##
## $中国地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 21.03703 25.82726
## 43.57143 21.24990 26.25511
## 43.71429 21.43799 26.54277
## 43.85714 21.62248 26.82493
## 44.00000 21.80357 27.10188
## 44.14286 21.98144 27.37391
## 44.28571 22.15626 27.64127
## 44.42857 22.32818 27.90420
## 44.57143 22.49733 28.16290
## 44.71429 22.66386 28.41757
## 44.85714 22.82787 28.66841
## 45.00000 22.98948 28.91556
## 45.14286 23.14878 29.15921
## 45.28571 23.30589 29.39947
##
##
## $四国地方
## $四国地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 1.898957 1.289666 1.289666 1.289666 1.289666 1.289666 1.289666 1.289666
## [9] 1.289666 1.289666 1.289666 1.289666 1.289666 1.289666
##
## $四国地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 -1.595193 -3.444885
## 43.57143 -2.463643 -4.450525
## 43.71429 -2.534870 -4.559458
## 43.85714 -2.604795 -4.666398
## 44.00000 -2.673486 -4.771453
## 44.14286 -2.741007 -4.874717
## 44.28571 -2.807415 -4.976279
## 44.42857 -2.872764 -5.076222
## 44.57143 -2.937103 -5.174619
## 44.71429 -3.000477 -5.271541
## 44.85714 -3.062928 -5.367052
## 45.00000 -3.124496 -5.461212
## 45.14286 -3.185217 -5.554076
## 45.28571 -3.245124 -5.645697
##
## $四国地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 5.393107 7.242798
## 43.57143 5.042975 7.029856
## 43.71429 5.114202 7.138789
## 43.85714 5.184127 7.245730
## 44.00000 5.252818 7.350784
## 44.14286 5.320339 7.454048
## 44.28571 5.386747 7.555611
## 44.42857 5.452096 7.655553
## 44.57143 5.516435 7.753951
## 44.71429 5.579808 7.850873
## 44.85714 5.642260 7.946384
## 45.00000 5.703827 8.040543
## 45.14286 5.764548 8.133408
## 45.28571 5.824456 8.225029
##
##
## $九州地方
## $九州地方$mean
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## [1] 58.47250 57.25623 50.56438 61.06459 64.47035 68.91864 68.74499 68.39265
## [9] 68.14981 64.12006 70.61634 70.13374 74.11701 74.17680
##
## $九州地方$lower
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 36.403044 24.720179
## 43.57143 30.532422 16.385694
## 43.71429 20.654083 4.820529
## 43.85714 30.192142 13.849252
## 44.00000 31.509093 14.060459
## 44.14286 31.738182 12.056037
## 44.28571 27.777580 6.090744
## 44.42857 21.209903 -3.767133
## 44.57143 16.324600 -11.110010
## 44.71429 8.209567 -21.387662
## 44.85714 11.451222 -19.868900
## 45.00000 7.534443 -25.603623
## 45.14286 7.751472 -27.380319
## 45.28571 4.247576 -32.770719
##
## $九州地方$upper
## Time Series:
## Start = c(43, 4)
## End = c(45, 3)
## Frequency = 7
## 80% 95%
## 43.42857 80.54196 92.22482
## 43.57143 83.98003 98.12676
## 43.71429 80.47468 96.30823
## 43.85714 91.93705 108.27994
## 44.00000 97.43160 114.88023
## 44.14286 106.09910 125.78124
## 44.28571 109.71240 131.39923
## 44.42857 115.57539 140.55243
## 44.57143 119.97502 147.40963
## 44.71429 120.03056 149.62779
## 44.85714 129.78146 161.10159
## 45.00000 132.73303 165.87110
## 45.14286 140.48254 175.61433
## 45.28571 144.10602 181.12432